We define Persistent Mutual Information (PMI) as the Mutual (Shannon) Information
between the past history of a system and its evolution significantly later in the future.
This quantifies how much past observations enable long term prediction, which we
propose as the primary signature of (Strong) Emergent Behaviour. The key feature of our
definition of PMI is the omission of an interval of 'present' time, so that the mutual
information between close times is excluded: this renders PMI robust to superposed noise
or chaotic behaviour or graininess of data, distinguishing it from a range of established
Complexity Measures. For the logistic map we compare predicted with measured long-time
PMI data. We show that measured PMI data captures not just the period doubling
cascade but also the associated cascade of banded chaos, without confusion by the
overlayer of chaotic decoration. We find that the standard map has apparently infinite PMI,
but with well defined fractal scaling which we can interpret in terms of the relative
information codimension. Whilst our main focus is in terms of PMI over time, we can also
apply the idea to PMI across space in spatially-extended systems as a generalisation of
the notion of ordered phases.

Keywords: emergence; persistent mutual information; chaotic dynamical systems;
complexity measure; logistic map



1. Introduction 

Our starting point is the desire to discover and quantify the extent to which the
future evolution of a dynamical system can be predicted from its past, and from the
standpoint of Complexity Theory we are interested in assessing this from observed
data alone without any prior parametric model.

We should nevertheless admit prior classification as to the general nature of the 
system, and simple constraining parameters such as its size, composition and local 
laws of motion. Given that information, there may prove to be entirely reproducible 
features of its subsequent evolution which inevitably emerge over time, such as 
eventual steady state behaviour (including probability distributions). This we follow
[3] [1] and others in terming weak emergence. The emergence is weak in the sense that
there is no choice of outcome: it can be anticipated without detailed inspection
of the particular instance.

We focus on Strong Emergence by which we mean features of behaviour significantly
into the future which can only be predicted with knowledge of prior history.
The implication is that the system has made conserved choices not determined by 
obvious conservation laws, or at least nearly conserved choices which imply the 
existence of associated slow variables. 

More formally, we must conceive of an ensemble comprising a probability distribution
of realisations of the system (and its history), from which observed histories
are drawn independently. The behaviour of a particular realisation which can be
anticipated from observing other realisations is weakly emergent, whilst that which
can only be forecast from the observation of the past of each particular instance
is strongly emergent. A related distinction between weak and strong emergence is
given in [19], but with quantification based on a metric on the underlying space,
rather than purely measure-theoretic.

2. Persistent Mutual Information 

Within an ensemble of histories of the system we can quantify strong emergence in
terms of mutual information between past and future history which persists across
an interval of time $\tau$. This Persistent Mutual Information is given by

$$ I(\tau) = \int \log\left( \frac{P[x_{-0}, x_{\tau+}]}{P[x_{-0}]\,P[x_{\tau+}]} \right) P[x_{-0}, x_{\tau+}] \, dx_{-0} \, dx_{\tau+} \tag{1} $$

where $x_{-0}$ designates a history of the system from far past up to present time 0, $x_{\tau+}$
is the corresponding history of the system from later time $\tau$ onwards, $P[x_{-0}, x_{\tau+}]$
is their joint probability density within the ensemble of histories, and $P[x_{-0}]\,P[x_{\tau+}]$
is the product of corresponding marginal probability densities for past and future
taken separately. If the history variables $x(t)$ are discrete-valued then integration
over histories is interpreted as summation; in the continuous case $I(\tau)$ has the merit
of being independent of continuous changes of variable, so long as they preserve time
labelling.

Quantitatively $I(\tau)$ measures the deficit of Shannon Entropy in the joint history
compared to that of past and future taken independently, that is

$$ I(\tau) = H[P[x_{-0}]] + H[P[x_{\tau+}]] - H[P[x_{-0}, x_{\tau+}]] \tag{2} $$

where the separate Shannon Entropies for a probability density $P$ of a set of
variables $y$ are given generically by

$$ H[P] = -\int \log(P[y]) \, P[y] \, dy. \tag{3} $$

Thus it is precisely the amount of information (in Shannon Entropy) about the
future which is determined by the past, and hence the extent to which the future
can be forecast from past observations (of the same realisation).
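For discrete-valued variables the entropy deficit of eq. (2) can be evaluated directly. The following is a minimal sketch, assuming the joint distribution of (coarse-grained) past and future is available as a small table; the function names and the two example tables are illustrative and are not taken from the measurements reported here.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy -sum p log p (natural log); zero-probability terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mutual_information(joint):
    """Entropy deficit H[past] + H[future] - H[joint] for joint[i][j] = P(past=i, future=j)."""
    p_past = [sum(row) for row in joint]
    p_future = [sum(col) for col in zip(*joint)]
    p_joint = [p for row in joint for p in row]
    return shannon_entropy(p_past) + shannon_entropy(p_future) - shannon_entropy(p_joint)

# Hypothetical two-state examples.
independent = [[0.25, 0.25], [0.25, 0.25]]  # past tells nothing about the future
locked = [[0.5, 0.0], [0.0, 0.5]]           # past fixes the future completely

print(mutual_information(independent))  # zero up to rounding
print(mutual_information(locked))       # log 2 up to rounding
```

The second table corresponds to a system that has made one binary choice which persists, so the full entropy of that choice, log 2, is shared between past and future.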

The key distinguishing feature of our definition above is the exclusion of
information on $x_{0\tau}$, that is the intervening time interval of length $\tau$. This ensures that
$I(\tau)$ is only sensitive to system memory which persists right across time $\tau$; any
features of shorter term correlation do not contribute. The choice of $\tau$ must inevitably
be informed by observation, but the extreme cases have sharp interpretation.

$I(0)$ corresponds directly to the Excess Entropy as introduced in [12] where it
was called "Effective Measure Complexity", and makes no distinction of timescale
in the transmission of information. This quantity has been discussed in many guises:
as effective measure complexity in [12] [17], "predictive information" in [2] and as
Excess Entropy in pj [11], to name but a few. See also [22] [13] [11] for measurements
of Excess Entropy and the related Entropy Rate on a variety of systems, including
the logistic map.

Our sharpest measure of Strong Emergence is the Permanently Persistent Mutual
Information (PPMI), that is the PMI $I(\infty)$ which persists to infinite time.
This quantifies the degree of permanent choice spontaneously made by the system,
which cannot be anticipated without observation but which persists for all time. A
prominent class of example is spontaneous symmetry breaking by ordered phases
of matter: here a physical system is destined to order in a state of lower symmetry
than the probability distribution of initial conditions, and hence must make a
choice (such as direction of magnetisation) which (on the scale of microscopic times)
endures forever. As a result Strong Emergence can only be diagnosed by observing
multiple independent realisations of the system, not just one long time history. An
interesting though anomalous case is presented by clock phase, where time shift
leads to different phases. This is exploited in measuring PPMI for the logistic map
in the following section (see fig. 1).

PPMI corresponds to some partitioning of the attracting dynamics of the system
into negligibly communicating (and negligibly overlapping) subdistributions. If the
dynamics evolves into partition $i$ with probability $p_i$ then the PPMI is simply given
by

$$ I(\infty) = -\sum_i p_i \log p_i \tag{4} $$

April 2, 2010 1:3 WSPC/INSTRUCTION FILE 



PMI'paper'draft 



4 RC Ball, M Diakonova, RS MacKay 

which is the entropy of the discrete distribution $p_i$. For deterministic dynamics
each $p_i$ is simply determined by the sampling of its associated basin of attraction
in the distribution of initial conditions, so in this case the PPMI is sensitive to the
latter. However for stochastic dynamics it is possible for the $p_i$ to be predominantly
determined by the distribution of early time fluctuations.
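As a worked illustration of the statement that the PPMI is the entropy of the partition probabilities, the sketch below evaluates that entropy for given $p_i$. The function name and the example probabilities are hypothetical; in practice the $p_i$ would have to be estimated from an ensemble of realisations.

```python
import math

def ppmi_from_partition(probs):
    """Entropy -sum_i p_i log p_i of the probabilities of ending in each partition."""
    if any(p < 0.0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("partition probabilities must be non-negative and sum to 1")
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Two equally likely magnetisation directions: one bit of permanent choice.
print(ppmi_from_partition([0.5, 0.5]))

# Unequal basins of attraction (made-up weights) give less than log 2.
print(ppmi_from_partition([0.9, 0.1]))

# A single attractor leaves no choice to be made.
print(ppmi_from_partition([1.0]))
```

The dependence on the weights in the second example reflects the remark above that, for deterministic dynamics, the PPMI is sensitive to how the initial conditions sample the basins of attraction.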

3. PPMI in the logistic map 

We consider time series from the logistic map $x_{n+1} = \lambda x_n (1 - x_n)$ as a simplest
non-trivial example which brings out some non-trivial points. Depending on the
control parameter $\lambda$, its attracting dynamics can be a limit cycle, fully chaotic, or
the combined case of banded chaos where there is a strictly periodic sequence of
bands visited but chaotic evolution within the bands [5].

In the case of a periodic attracting orbit with period $T$, the choice of phase of
the cycle is a permanent choice leading directly to positive PPMI. For this case the
phase separation is in the time domain, so the attractor can be fully sampled by
shifting the start time of observation. If we assume the latter is uniformly distributed
over a range of time large compared to the period, then the observed phase
is the result of a uniformly selected, symmetry-breaking choice in which each
phase has $p_i = 1/T$, and this leads to PPMI

$$ I(\infty) = \log(T). \tag{5} $$

This is a generic result and not special to the logistic map. Because there is just an
attracting orbit, the Excess Entropy gives the same value.

For fully chaotic attracting dynamics such as at $\lambda = 4$, we have to be careful in
principle about limits. Provided the probability densities are measured with only
limited resolution $\delta x$ in $x$, then we expect past and future to appear effectively
independent for $\tau > \tau(\delta x)$ and hence $I(\tau) \to 0$ and there is zero PPMI. Thus
for chaotic motion the associated values are quite different: the Excess Entropy
is positive and reflects the complexity of its dynamics, whereas the PPMI is zero
reflecting the absence of long time correlations.

Both of the above results can be seen in measured numerical data for the logistic
map in fig. 1. What is more pleasing still is the behaviour of PMI for banded
chaos, where a $T$-periodic sequence of bands shows through to give $I(\infty) \ge \log(T)$
(assuming random initiation phase as before) with equality when the $T$th iterate
restricted to one band is mixing. The fact that the numerical results overshoot $\log(T)$
for many parameter values can be attributed to the presence of a finer partition
than the $T$ bands, for example around an attracting periodic orbit of period a
multiple of $T$ or into sub-bands with a period a multiple of $T$. Even in cases where the
dynamics really is $T$-periodic mixing (meaning it cyclically permutes $T$ bands and
the $T$th iterate restricted to one is mixing), and hence PPMI is precisely $\log(T)$,
the numerics might pick up some long-time correlation that does not decay until
after the computational time interval. One can see this in the more detailed data of
fig. 2 where there are background plateaux corresponding to the periodicity of the
observed major bands, decorated by narrower peaks corresponding to higher
periodicities. The steps in PMI are particularly clear where bands merge because these
special points have strong mixing dynamics within each band cycle [20]. Figures 1
and 2 for the PPMI can be usefully contrasted with the Excess Entropy graphed in
figure 1 of [11]. On the period-doubling side they display the same values (log of the
period), whereas in the regions of banded chaos the PPMI picks out the number
of bands whilst the Excess Entropy is complicated by its sensitivity to short time
correlations in the chaotic decoration.

Fig. 1. Bifurcation diagram (above) and measured Persistent Mutual Information (below) for the
Logistic Map, as a function of the map control parameter $\lambda$. For each value of $\lambda$ the map was
allowed a minimum of $10^5$ iterations to settle and then the Mutual Information measured across a
'time' interval of $10^5$ iterations. Each MI measurement used the distance to 4th nearest neighbour
to estimate probability density ($k = 4$), based on a sample of $N = 5000$ iterate pairs. Before chaos
sets in PMI increases stepwise in jumps of $\log 2$, reflecting the doubling of the resolved period.
It is also seen to pick up the band periodicity after the onset of chaos (resolving some more fine
periods within the band hopping), as well as a nonchaotic period three regime.
Fig. 2. PMI for a chaotic region of the control parameter (sample size $N = 5000$, $k = 4$, settle
time $10^{10}$, time separation $10^6$). In this range chaotic bands are known to merge (see bifurcation
diagram in fig. 1). PMI picks out the relevant decrease in overall periodicity as 16 bands merge
pairwise into eight, four, and finally two at $\lambda$ slightly less than 3.68. PMI also detects a period
tripling regime, which can be seen around $\lambda = 3.63$.
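The periodic and fully chaotic limits discussed in this section can be reproduced qualitatively with a very simple estimator. The sketch below is an illustration only and is not the method used for figs. 1 and 2: it swaps the k-NN estimator for a naive histogram (plug-in) estimate of the mutual information between single iterates, samples the phase by shifting the start time, and uses much shorter settle and separation times. All function names, the initial condition 0.1234, and the bin counts are assumptions made for the example.

```python
import math
from collections import Counter

def logistic_orbit(lam, x0, n_settle, n_keep):
    """Iterate x -> lam * x * (1 - x), discard n_settle steps, return the next n_keep values."""
    x = x0
    for _ in range(n_settle):
        x = lam * x * (1.0 - x)
    orbit = []
    for _ in range(n_keep):
        orbit.append(x)
        x = lam * x * (1.0 - x)
    return orbit

def histogram_pmi(orbit, gap, n_pairs, n_bins):
    """Plug-in mutual information between binned x_n and x_{n+gap}, n = 0..n_pairs-1."""
    def bin_of(x):
        return min(int(x * n_bins), n_bins - 1)
    pairs = [(bin_of(orbit[n]), bin_of(orbit[n + gap])) for n in range(n_pairs)]
    def entropy(counts):
        return -sum(c / n_pairs * math.log(c / n_pairs) for c in counts.values())
    past = Counter(a for a, _ in pairs)
    future = Counter(b for _, b in pairs)
    return entropy(past) + entropy(future) - entropy(Counter(pairs))

def logistic_pmi(lam, n_bins, gap=1000, n_pairs=20000, n_settle=10000):
    """PMI across `gap` iterations, sampling the attractor by shifting the start time."""
    orbit = logistic_orbit(lam, 0.1234, n_settle, n_pairs + gap)
    return histogram_pmi(orbit, gap, n_pairs, n_bins)

print(logistic_pmi(3.2, n_bins=100), math.log(2))  # period 2: close to log 2
print(logistic_pmi(3.5, n_bins=100), math.log(4))  # period 4: close to log 4
print(logistic_pmi(4.0, n_bins=16))                # chaotic: small, set by finite-sample bias
```

The chaotic case uses coarser bins because the plug-in estimate acquires a positive bias that grows with the number of occupied joint cells; this is one face of the sample-size problem of histogram methods taken up in the next section.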



4. Issues measuring PMI 

Measuring Mutual Information, and in particular the implied measurement of the
entropy of the joint distribution, suffers from standard challenges in measuring the
entropy of high dimensional data. The naive 'histogram' method, in which probability
densities are estimated directly from frequency counts in pre-selected
(multidimensional) intervals, is easy to apply but can require very large sample sizes in
order to ensure that the significant frequencies are estimated from multiple (rather
than single) counts. In practice we found the $k$'th neighbour approach of Kraskov
et al [16] a more effective tool (from now on referred to as the k-NN method). It
is more limited in sample size owing to its unfavourable algorithmic order, but this was
outweighed by its automatic adjustment of spatial resolution to the actual density
of points.

The basis of the k-NN method is to estimate the entropy of a distribution from
the following estimate of the logarithm of local probability density about each
sampled point:

$$ \log(p) \approx \log\left(\frac{k}{N \epsilon_k}\right) + \left[\Psi(k) - \Psi(N) - \log\left(\frac{k}{N}\right)\right] = -\log \epsilon_k + \Psi(k) - \Psi(N) \tag{6} $$

where $N$ is the total number of sample points and $\epsilon_k$ is the volume of space out
to the location of the $k$'th nearest neighbour of the sample point in question. The
combination $\frac{k}{N \epsilon_k}$ in the first logarithm is simply interpretable as an amount of
sampled probability in the neighbourhood divided by corresponding volume. In the
remaining term, $\Psi(z) = \Gamma'(z)/\Gamma(z) \approx \log(z) - \frac{1}{2z}$ as $|z| \to \infty$ is the digamma
function, and the whole of this term is only significant for small $k$, where it corrects
a slight bias associated with finite sampling of the neighbourhood [16].
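To make eq. (6) concrete, the following sketch applies it to scalar data and averages $-\log(p)$ over the sample to obtain an entropy estimate. It is a minimal illustration rather than the code used for the measurements: the function names are hypothetical, the "volume" $\epsilon_k$ is assumed to be the interval length $2 r_k$ where $r_k$ is the distance to the $k$'th nearest neighbour, and the digamma function is evaluated only at integer arguments via its harmonic-sum form.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def knn_log_density_1d(samples, k):
    """Eq. (6) at each sample point (returned in sorted order of the samples).

    Assumes distinct sample values and more than k samples.
    """
    xs = sorted(samples)
    n = len(xs)
    correction = digamma_int(k) - digamma_int(n)
    log_p = []
    for i, x in enumerate(xs):
        # In sorted 1D data the k nearest neighbours lie within k positions either side.
        lo, hi = max(0, i - k), min(n, i + k + 1)
        dists = sorted(abs(xs[j] - x) for j in range(lo, hi) if j != i)
        eps_k = 2.0 * dists[k - 1]  # assumed 1D volume out to the k'th neighbour
        log_p.append(-math.log(eps_k) + correction)
    return log_p

def knn_entropy_1d(samples, k=4):
    """Entropy estimate as the sample mean of -log(p)."""
    log_p = knn_log_density_1d(samples, k)
    return -sum(log_p) / len(log_p)

rng = random.Random(0)
uniform_samples = [rng.random() for _ in range(2000)]
print(knn_entropy_1d(uniform_samples, k=4))  # exact entropy of the uniform density on [0, 1] is 0
```

Because the neighbourhood shrinks where points are dense and grows where they are sparse, no bin width has to be chosen in advance, which is the automatic adjustment of resolution referred to above.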

We interpret $\log(N/k) \approx \Psi(N) - \Psi(k)$ as the (logarithmic) probability
resolution in these measurements. When $N$ is large and $k$ is either large or held fixed the variation
of these two forms is equivalent and we generally show the first for simplicity of
exposition. The main exception is for entropy data on the standard map below,
where taking data down to $k = 1$ significantly enhances our appreciation of scaling
and the use of the more accurately scaled second form is important.

Because PMI is invariant under changes of variable, there is considerable scope
for choice of how to parameterise past and future before feeding into the PMI
measurement. For the logistic map we exploited its deterministic nature, by which
one value of $x_n$ provides all information about the past up to iterate $n$ which is
relevant to the future, and similarly all influence of the past on the future beyond
iterate $n'$ is captured by the value of $x_{n'}$. Note however that we did not need to
identify minimalist causal states in the sense discussed in section 5 below.

For systems without known causal coordinates, the practical measurement of
PMI has a rich time parameterisation. In principle what we can directly measure is
the mutual information $I(t_1, t_2; t_3, t_4)$ between time intervals $[t_1, t_2]$ and $[t_3, t_4]$. If
we assume stationarity then this is more naturally parameterised as $I(\tau; T_-, T_+)$ in
terms of the intervals $T_- = t_2 - t_1$ and $T_+ = t_4 - t_3$ of past and future respectively
as well as the intervening interval $\tau = t_3 - t_2$. Then the full PMI is defined as

$$ I(\tau) = \lim_{T_-, T_+ \to \infty} I(\tau; T_-, T_+). \tag{7} $$

If the PPMI is desired it is computationally efficient to set $\tau \to \infty$ before taking the
limits above, because the dimension of space in which entropy must be measured is
set by $T_- + T_+$ alone. By contrast with PMI, the Predictive Information developed
at length in reference [2] is in the present notation $I(0; T, T)$.
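For a sampled time series the parameterisation by $T_-$, $T_+$ and $\tau$ amounts to cutting out pairs of windows separated by a gap. The sketch below shows one way to do this; the function name and the convention of counting window lengths in samples (with `gap` the number of steps from the last past sample to the first future sample) are assumptions made for the illustration.

```python
def past_future_windows(series, n_past, n_future, gap):
    """Return (past, future) window pairs from a scalar time series.

    past   = n_past consecutive samples ending at index i
    future = n_future consecutive samples starting at index i + gap
    so gap >= 1 counts the time steps separating the two windows.
    """
    if n_past < 1 or n_future < 1 or gap < 1:
        raise ValueError("n_past, n_future and gap must all be at least 1")
    pairs = []
    last_end = len(series) - gap - n_future
    for i in range(n_past - 1, last_end + 1):
        past = tuple(series[i - n_past + 1 : i + 1])
        future = tuple(series[i + gap : i + gap + n_future])
        pairs.append((past, future))
    return pairs

demo = past_future_windows(list(range(10)), n_past=2, n_future=2, gap=3)
print(demo[0])    # ((0, 1), (4, 5))
print(demo[-1])   # ((4, 5), (8, 9))
print(len(demo))  # 5
```

Each pair is a point in a space of dimension `n_past + n_future` whatever the value of `gap`, which is why increasing the gap does not by itself raise the dimension in which the joint entropy has to be estimated.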

Practical measurement of PMI entails some limited resolution, whether explicitly
by histogram methods or implicitly through the depth of sampling in k-NN and
other adaptive methods. This inevitably leads to long periodic orbits being capped
in their apparent period and hence their measured $I(\tau)$. We can be fairly concrete in
the case of measurement by the k-NN method, which looks out across a neighbourhood
whose aggregate measure is $k/N$. The longest period one can thereby detect
is of order $N/k$ so we are led to expect $I(\tau) \approx \log(\min(T, N/k))$. One point where
we can check this quantitatively is the accumulation point of the period-doubling
sequence [6] [15] [10]: fig. 3 shows the measured results agreeing with the expectation
that $I(\tau) \approx \log(N/k)$.
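The resolution cap can be evaluated for any sample settings. The short sketch below simply codes the expectation $\log(\min(T, N/k))$; the function name is hypothetical, and since the cap is stated only as "of order" $N/k$ the result should be read as indicative up to an additive constant.

```python
import math

def resolution_limited_pmi(period, n_samples, k):
    """Expected measured PMI log(min(T, N/k)) for an orbit of period T."""
    return math.log(min(period, n_samples / k))

# With the fig. 1 settings N = 5000, k = 4 the cap on the resolvable period is N/k = 1250.
print(resolution_limited_pmi(4, 5000, 4))         # short period: log 4, unaffected by the cap
print(resolution_limited_pmi(math.inf, 5000, 4))  # accumulation point: log(N/k)
```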




Fig. 3. PMI of the logistic map at the period-doubling accumulation point plotted against the
effective resolution with which probability density has been measured, given by $\log(N/k)$ ($N$ =
1000, 2000, ..41000 where runs with higher $N$ correspond to darker points; $k$ = 1, 2, 3, 4, 5, 10, ..50,
time separation 10 s, settle time $10^{10}$). The periodicity measured is resolution limited, with
apparent overall slope of this plot ca 0.9 in fair agreement with slope unity predicted on the basis of
resolvable period $\propto N/k$ (see text).



5. Relationship with Statistical Complexity 

Statistical Complexity (in certain contexts equivalent to "True Measure Complexity"
first introduced in [12]; see also 0[21]) is built on the projection of past and
future down to optimal causal states $S_-(t)[x_{t-}]$ such that $P[x_{t+} | x_{t-}] = P[x_{t+} | S_-(t)]$.
In terms of these one readily obtains that the PMI is given by the Mutual
Information between time-separated forward and reverse causal states, that is

$$ I(\tau) = H[P[S_-(t)]] + H[P[S_+(t+\tau)]] - H[P[S_-(t), S_+(t+\tau)]] \tag{8} $$

as a straightforward generalisation of the corresponding result for the Excess
Entropy $I(0)$ p2].

In general one cannot simplify the above formula ($I_{-+}$ in a natural extension
of the notation) to use other combinations of choice between $S_-$ or $S_+$. However,
we conjecture that under fairly general conditions they are all equivalent for the
PPMI, i.e. $\tau \to \infty$. It is an interesting open question whether for general time gap

t it can be proved that I |_ > (/__, > 1^ , and perhaps also I = J++. 

For $\tau = 0$ similar forward and reverse time dependencies for $\epsilon$-machines have been
considered in [9], where it was noted that in general I =/= and bidirectional
machines were defined that incorporate this time asymmetry.

6. Fractal and Multifractal PMI: example of the standard map 

As we already observed for the accumulation point of the logistic map, where
the attractor of the dynamics is a Cantor fractal adding machine, the measured
PMI may go to infinity as the resolution increases. The archetypal case of this is
where the probability measures themselves have fractal support or more generally
exhibit multifractal scaling. The general phenomenology is that dividing space of
dimension $d$ into cells $c$ of linear width $\epsilon$ and integrated measure $\mu_c = \int_c P(x) \, d^d x$,
the density in a cell is estimated as $\mu_c / \epsilon^d$ and hence to leading order as $\epsilon \to 0^+$ one
expects

$$ H[P] = -\sum_c \mu_c \log\left(\frac{\mu_c}{\epsilon^d}\right) = d \log\epsilon - D \log\epsilon + \text{const} \tag{9} $$

where $D$ is the information dimension of the integrated (natural) measure $\mu$, defined
to be

$$ D = \lim_{\epsilon \to 0^+} \frac{\sum_c \mu_c \log(\mu_c)}{\log\epsilon}. \tag{10} $$
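The definition of the information dimension can be checked on a measure whose dimension is known. The sketch below evaluates the finite-resolution ratio in eq. (10) for the uniform measure on the middle-third Cantor set, for which $D = \log 2 / \log 3$. It is an illustration under stated assumptions: the function names are hypothetical, the Cantor set is truncated at a finite depth, and points are held as integers in base 3 so that box assignment is exact.

```python
import math
from collections import Counter
from itertools import product

def information_dimension(box_indices, eps):
    """Finite-resolution estimate sum_c mu_c log(mu_c) / log(eps) of eq. (10)."""
    counts = Counter(box_indices)
    n = sum(counts.values())
    weighted_log = sum(c / n * math.log(c / n) for c in counts.values())
    return weighted_log / math.log(eps)

def cantor_points(depth):
    """Integers m in [0, 3**depth) with base-3 digits 0 or 2 only; the point is x = m / 3**depth."""
    points = []
    for digits in product((0, 2), repeat=depth):
        m = 0
        for d in digits:
            m = 3 * m + d
        points.append(m)
    return points

depth, level = 10, 5                      # 2**10 points, boxes of width eps = 3**-5
points = cantor_points(depth)
boxes = [m // 3 ** (depth - level) for m in points]
d_cantor = information_dimension(boxes, 3.0 ** -level)
print(d_cantor, math.log(2) / math.log(3))
```

For a measure that fills its space uniformly the same ratio returns the embedding dimension $d$, so that $H[P]$ in eq. (9) is then independent of $\epsilon$ at leading order.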

Applying this to the PMI through eq. (2) then leads to

$$ I(\tau) = I_0(\tau) - (D_- + D_+ - D_{-+}) \log\epsilon \tag{11} $$

where $D_{-+}$ is the information dimension of the joint distribution of past and future,
$D_-$ and $D_+$ the information dimensions of the respective marginal distributions, and
$I_0(\tau)$ the extrapolated resolution-free PMI (note the dimensions of the underlying
spaces cancel from $I(\tau)$ because $d_{-+} = d_- + d_+$).

Applying the equivalent analysis to the k-NN method we have to be careful to
insist that eq. (1) is used, meaning in particular that it is a neighbourhood of $k$
neighbours in the joint distribution which is taken to determine the ratio of joint and
marginal probability densities within the logarithm. With this understanding, $\log\epsilon$
in the above expression for PMI can be written in terms of probability resolution as
$\frac{1}{D_{-+}} \log(k/N)$ leading to

$$ I(\tau) = I_0(\tau) + \Gamma \log(N/k) \tag{12} $$

where

$$ \Gamma = \frac{D_- + D_+ - D_{-+}}{D_{-+}} \tag{13} $$

is the relative information codimension.
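In practice the relative information codimension is read off as the slope of measured PMI against the logarithmic probability resolution. The sketch below shows both the defining ratio and an ordinary least-squares fit of that slope. The function names are hypothetical and the data fed to the fit are synthetic numbers constructed for the example, not measurements.

```python
import math

def relative_codimension(d_past, d_future, d_joint):
    """Relative information codimension (D_- + D_+ - D_-+) / D_-+."""
    return (d_past + d_future - d_joint) / d_joint

def fit_codimension(resolutions, pmi_values):
    """Least-squares line pmi = I0 + Gamma * resolution; returns (I0, Gamma)."""
    n = len(resolutions)
    mean_x = sum(resolutions) / n
    mean_y = sum(pmi_values) / n
    sxx = sum((x - mean_x) ** 2 for x in resolutions)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(resolutions, pmi_values))
    gamma = sxy / sxx
    return mean_y - gamma * mean_x, gamma

# When past, future and joint measures all share one dimension, Gamma = 1.
d = math.log(2) / math.log(3)
print(relative_codimension(d, d, d))

# Synthetic check: data generated with I0 = 0.2 and Gamma = 0.5 are recovered by the fit.
xs = [math.log(3000 / k) for k in (1, 2, 4, 8, 16, 32, 64)]
ys = [0.2 + 0.5 * x for x in xs]
print(fit_codimension(xs, ys))
```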

Our first check is the accumulation point of period doubling of the logistic map,
at which the dynamics causally orbits its Cantor set attractor. In this case all the
information dimensions above are each equal to the fractal dimension of the Cantor
set, leading to $\Gamma = 1$ in agreement with our earlier observations and interpretation
based on resolution limited period. Fig. 3 shows the directly measured PMI with
apparent $\Gamma \approx 0.9$ in fair agreement, where sampling over the attractor set is achieved
by using different times of measurement. A uniform distribution of measurement
times approximates uniform sampling of the unique invariant probability measure
of the attractor.

The standard map provides a much more subtle test of this phenomenology.
This two-dimensional map

$$ p' = (p + K \sin x) \bmod 2\pi, \qquad x' = (x + p') \bmod 2\pi \tag{14} $$

is strictly area-preserving in the $x, p$ plane (reduced modulo $2\pi$) and has uniform
invariant measure. Thus if we launch this dynamics with random initial conditions,
the marginal distributions (joint of $x, p$ at fixed iteration) remain strictly uniform
forever. The joint distribution between distant iterations of the standard map is far
from simple and uniform, at least for moderate values of the map parameter $K$.
Fig. 4 shows the measured PMI as a function of the probability resolution $k/N$,
displaying clear fractal behaviour for values of $0 < K < 6$.
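A single step of the standard map and its area-preserving property can be sketched as follows. The function names are hypothetical; the Jacobian is written out by hand from eq. (14), and the modulo reduction does not affect it.

```python
import math

TWO_PI = 2.0 * math.pi

def standard_map_step(x, p, K):
    """One iteration of eq. (14): p' = (p + K sin x) mod 2 pi, x' = (x + p') mod 2 pi."""
    p_new = (p + K * math.sin(x)) % TWO_PI
    x_new = (x + p_new) % TWO_PI
    return x_new, p_new

def jacobian_determinant(x, K):
    """Determinant of d(x', p')/d(x, p) = [[1 + K cos x, 1], [K cos x, 1]]."""
    c = K * math.cos(x)
    return (1.0 + c) * 1.0 - 1.0 * c

# With K = 0 the momentum is conserved and x advances by p each step.
print(standard_map_step(1.0, 0.5, 0.0))  # (1.5, 0.5)

# The determinant is 1 for every x and K, so phase-space area is preserved.
print(jacobian_determinant(1.3, 0.97))
```

Area preservation is what keeps a uniform ensemble of initial conditions uniform at every later iteration, so any structure seen in the PMI must reside in the joint distribution between iterations.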

The corresponding estimates of the relative information codimension are shown
in fig. 5. For the left end point we can anticipate $\lim_{K \to 0} \Gamma(K) = 1/3$ on theoretical
grounds, because this limit corresponds to a continuous time Hamiltonian dynamics
which is energy (Hamiltonian) conserving. The joint distribution can therefore only
explore $D_{-+} = 4 - 1$ degrees of freedom leading to $\Gamma = 1/3$. Note this result
depends on the assumption that the shear in the dynamics between close energy
shells is sufficient to destroy correlation of their in-shell coordinates, and we did
correspondingly find that we had to use a large number of map iterations for the
expected behaviour to emerge.
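The arithmetic behind the limiting value can be spelled out. Past and future are each a point $(x, p)$ whose marginal distribution is uniform on the two-dimensional torus, so $D_- = D_+ = 2$, while energy conservation removes one of the four joint degrees of freedom. Taking the relative information codimension as the ratio of the dimension deficit to the joint dimension, one has

```latex
\Gamma \;=\; \frac{D_- + D_+ - D_{-+}}{D_{-+}}
       \;=\; \frac{2 + 2 - (4 - 1)}{4 - 1}
       \;=\; \frac{1}{3}.
```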

The apparent peak in $\Gamma$ around $K = 1$ is particularly interesting because this
is the vicinity of $K_c$ where momentum becomes diffusive, closely associated (and
often identified) with breakup of the golden KAM curve at $K_g = 0.971635..$ [HI
I18j . Dynamical anomalies have been observed around this critical value of $K$ which
might underlie the peak we observe in PPMI. On the other hand corresponding long
time dynamical correlations pose a threat to whether our results are adequately
converged.

For larger $K$ our measurements are consistent with $\Gamma(K) = 0$ for $K > K_1$ where
$6 < K_1 < 7$, and indeed the full PMI is within uncertainty of zero in this regime,
meaning the map appears fully chaotic to the level we can resolve.
Fig. 4. PMI for the standard map at 3000 iterations, as a function of the probability resolution used
to measure it, for map parameters $K$ = 0.1, 0.5, 0.9, 1., 1.1, 1.2, 1.5, 2, 3, 4, 6 (top to bottom). The
resolution of probability is plotted on the horizontal axis as $\Psi(N) - \Psi(k) \approx \ln(N/k)$, where $N = 3000$ is the number of
sample points used, and $k$ is the rank of neighbour used in the measurement of Mutual Information
(see text) which ranges from 1 to 64 across each plot. For each map parameter there is a clear
linear dependence on this logarithmic resolution, consistent with fractal phenomenology and the
interpretation of the slope as a relative information codimension. The data have been averaged
over five independent sets of such measurements with the error bars showing the $1\sigma$ error in each
mean, and the drawn lines correspond to the slopes plotted in fig. 5.



7. Conclusions 

We have shown that Persistent Mutual Information is a discriminating diagnostic
of hidden information in model dynamical systems, and the Permanent PMI is a
successful indicator of Strong Emergence.

The detailed behaviour of the logistic map is sufficiently re-entrant, with
periodicity and cascades of period multiplication intermingled amidst chaos, that we
are unlikely to have the last word on the full quantitative behaviour of the PPMI
as a function of map parameter $\lambda$ beyond the first cascade.

For the standard map, PPMI reveals some of the subtlety only otherwise
accessible through dynamical properties such as explicit orbits. Precise relationships
remain an open issue, particularly around critical map parameter $K_c$. The observed
fractal behaviour with a deficit between the joint information dimension and those
of the marginals is, we suggest, a general phenomenology. Whether it reflects truly
fractal and multifractal behaviour in any particular case should rest on a wider
multifractal analysis of the joint probability measure, which we intend to address
in later work on a wider range of non-trivial dynamical systems.

Fig. 5. Relative information codimension $\Gamma$ for the standard map (at 3000 iterations) as a function
of map parameter $K$. These are the slopes of the data shown in fig. 4 and the error bars are
estimated as $2\sigma$ based on separate best fit to the five independent runs of data. The intercept at
$K = 0$ matches theoretical expectation of 1/3 (see text) and the fall to zero at large $K$ is consistent
with dominance by chaotic dynamics. It is interesting that $\Gamma$ peaks in the vicinity of $K_g = 0.97$
where the golden KAM curve breaks up, but anomalously slow dynamical relaxation in this region
(see [4] and references therein) means that the peak may not reflect the limit of infinite iterations
and hence true PPMI.

Application to intrinsically stochastic systems and real world data are outstanding
challenges. We can however readily invoke a wide variety of examples associated
with ordering phenomena in statistical physics where a dynamically persistent and
spatially coherent order parameter emerges. In these cases there is clearly PPMI in
time, but also we can consider just a time slice and let one spatial coordinate take
over the role of time in our PMI analysis. Spin Glasses are a key instance where the
two viewpoints are not equivalent: these have order and hence Mutual Information
persisting in time but not in space.